Shortest path distance in random k-nearest neighbor graphs



Morteza Alamgir 1 morteza@tuebingen.mpg.de 
Ulrike von Luxburg 1,2 ulrike.luxburg@tuebingen.mpg.de 

1 Max Planck Institute for Intelligent Systems, Tübingen, Germany

2 Department of Computer Science, University of Hamburg, Germany 






Abstract 

Consider a weighted or unweighted k-nearest
neighbor graph that has been built on n data
points drawn randomly according to some
density p on $\mathbb{R}^d$. We study the convergence of
the shortest path distance in such graphs as 
the sample size tends to infinity. We prove 
that for unweighted kNN graphs, this dis- 
tance converges to an unpleasant distance 
function on the underlying space whose prop- 
erties are detrimental to machine learning. 
We also study the behavior of the shortest 
path distance in weighted kNN graphs. 



1. Introduction 

The shortest path distance is the most fundamental 
distance function between vertices in a graph, and it is 
widely used in computer science and machine learning. 
In this paper we want to understand the geometry induced by the shortest path distance in randomly generated geometric graphs like k-nearest neighbor graphs.

Consider a neighborhood graph G built from an i.i.d. sample $X_1, \dots, X_n$ drawn according to some density p on $\mathcal{X} \subset \mathbb{R}^d$ (for exact definitions see Section 2). Assume that the sample size n goes to infinity. Two questions arise about the behavior of the shortest path distance between fixed points in this graph:

1. Weight assignment: Given a distance measure D on $\mathcal{X}$, how can we assign edge weights such that the shortest path distance in the graph converges to D?

2. Limit distance: Given a function h that assigns weights of the form $h(\|X_i - X_j\|)$ to edges in G, what is the limit of the shortest path distance in this weighted graph as $n \to \infty$?



The first question has already been studied in some special cases. Tenenbaum et al. (2000) discuss the case of ε- and kNN graphs when p is uniform and D is the geodesic distance. Sajama & Orlitsky (2005) extend these results to ε-graphs from a general density p by introducing edge weights that depend on an explicit estimate of the underlying density. In a recent preprint, Hwang & Hero (2012) consider completely connected graphs whose vertices come from a general density p and whose edge weights are powers of distances.

There is little work regarding the second question. Tenenbaum et al. (2000) answer the question for a very special case with $h(x) = x$ and uniform p. Hwang & Hero (2012) study the case $h(x) = x^a$, $a > 1$ for arbitrary density p.

We have a more general point of view. In Section 4
we show that depending on properties of the function 
h(x), the shortest path distance operates in different 
regimes, and we find the limit of the shortest path 
distance for particular function classes of h(x). Our 
method also reveals a direct way to answer the first 
question without explicit density estimation. 

An interesting special case is the unweighted kNN 
graph, which corresponds to the constant weight func- 
tion h(x) = 1. We show that the shortest path dis- 
tance on unweighted kNN-graphs converges to a limit 
distance on X that does not conform to the natural 
intuition and induces a geometry on X that can be 
detrimental for machine learning applications. 

Our results have implications for many machine learning algorithms; see Section 5 for more discussion. (1) The shortest paths based on unweighted kNN graphs prefer to go through low density regions, and they even accept large detours if this avoids passing through high density regions (see Figure 1 for an illustration). This is exactly the opposite of what we would like to achieve in most applications. (2) For manifold learning algorithms like Isomap, unweighted kNN graphs introduce a fundamental bias that leads to huge distortions in the estimated manifold structure (see Figure 2 for an illustration). (3) In the area of semi-supervised learning, a standard approach is to construct a graph on the sample points, then compute a distance between vertices of the graph, and finally use a standard distance-based classifier to label the unlabeled points (e.g., Sajama & Orlitsky 2005 and Bijral et al. 2011). The crucial property exploited in this approach is that distances between points should be small if they are in the same high-density region. Shortest path distances in unweighted kNN graphs and their limit distances do exactly the opposite, so they can be misleading for this approach.

Figure 1. The shortest path based on an unweighted (red) and Euclidean weighted (black) kNN graph.

Figure 2. Original data (left) and its Isomap reconstruction based on an unweighted kNN graph (right).

2. Basic definitions

Consider a closed, connected subset $\mathcal{X} \subset \mathbb{R}^d$ that is endowed with a density function p with respect to the Lebesgue measure. For ease of presentation we assume for the rest of the paper that the density p is Lipschitz continuous with Lipschitz constant L and bounded away from 0 by $p_{\min} > 0$. To simplify notation later on, we define the shorthand $q(x) := (p(x))^{1/d}$.

We will consider different metrics on $\mathcal{X}$. A ball with respect to a particular metric D in $\mathcal{X}$ will be written as $B(x, r, D) := \{y \in \mathcal{X} \mid D(x, y) \le r\}$. We denote the Euclidean volume of the unit ball in $\mathbb{R}^d$ by $\eta_d$.

Assume the finite dataset $X_1, \dots, X_n$ has been drawn i.i.d. according to p. We build a geometric graph $G = (V, E)$ that has the data points as vertices and connects vertices that are close. Specifically, for the kNN graph we connect $X_i$ with $X_j$ if $X_i$ is among the k nearest neighbors of $X_j$ or vice versa. For the ε-graph, we connect $X_i$ and $X_j$ whenever their Euclidean distance satisfies $\|X_i - X_j\| \le \varepsilon$. In this paper, all graphs are undirected, but might carry edge weights $w_{ij} > 0$. In unweighted graphs, we define the length of a path by its number of edges; in weighted graphs we define the length of a path by the sum of the edge weights along the path. In both cases, the shortest path (SP) distance $D_{sp}(x, y)$ between two vertices $x, y \in V$ is the length of the shortest path connecting them.

Let f be a positive continuous scalar function defined on $\mathcal{X}$. For a given path $\gamma$ in $\mathcal{X}$ that connects x with y and is parameterized by t, we define the f-length of the path as

$$D_{f,\gamma} = \int_{\gamma} f(\gamma(t))\,|\gamma'(t)|\,dt.$$

This expression is also known as the line integral along $\gamma$ with respect to f. The f-geodesic path between x and y is the path with minimum f-length.

The f-length of the geodesic path is called the f-distance between x and y. We denote it by $D_f(x, y)$. If $f(x)$ is a function of the density p at x, then the f-distance is sometimes called a density based distance (Sajama & Orlitsky 2005).

The f-distance on $\mathcal{X}$ is a metric, and in particular it satisfies the triangle inequality. Another useful property is that for a point u on the f-geodesic path between x and y we have $D_f(x, y) = D_f(x, u) + D_f(u, y)$.

The function f determines the behavior of the f-distance. When $f(x)$ is a monotonically decreasing function of the density $p(x)$, passing through a high density region will cost less than passing through a low density region. It works the other way round when f is a monotonically increasing function of the density. A constant function does not impose any preference between low and high density regions.

The main purpose of this paper is to study the relationship between the SP distance in various geometric graphs and particular f-distances on $\mathcal{X}$. For example, in Section 3 we show that the SP distance in unweighted kNN graphs converges to the f-distance with $f(x) = p(x)^{1/d}$.

In the rest of the paper, all statements refer to points x and y in the interior of $\mathcal{X}$ such that their f-geodesic path is bounded away from the boundary of $\mathcal{X}$.
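The graph construction and the two notions of path length from Section 2 can be sketched in a few lines of Python. This is only an illustration under our own naming (`knn_graph`, `hop_distance` and `weighted_distance` are hypothetical helper names, not part of the paper); it uses a brute-force neighbor search and is meant for small point sets.

```python
import heapq
import math
from collections import deque


def knn_graph(points, k):
    """Symmetric kNN graph: i ~ j if i is among the k nearest
    neighbors of j or vice versa (brute-force search)."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distance(adj, s, t):
    """Unweighted SP distance: minimal number of edges (BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf


def weighted_distance(adj, points, s, t, h=lambda r: r):
    """Weighted SP distance with edge weights h(||X_i - X_j||) (Dijkstra).
    The default h(r) = r gives Euclidean edge weights."""
    best = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > best.get(u, math.inf):
            continue
        for v in adj[u]:
            nd = d + h(math.dist(points[u], points[v]))
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf
```

For example, on six equally spaced points on a line with k = 2, the hop distance between the end points is smaller than the number of points minus one because edges can skip a neighbor, while the Euclidean weighted distance equals the length of the line.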

3. Shortest paths in unweighted graphs 

In this section we study the behavior of the shortest path distance in the family of unweighted kNN graphs. We show that the rescaled graph SP distance converges to the q-distance in the original space $\mathcal{X}$.

Theorem 1 (SP limit in unweighted kNN graphs)

Consider the unweighted kNN graph $G_n$ based on the i.i.d. sample $X_1, \dots, X_n \in \mathcal{X}$ from the density p.
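Choose $\lambda$ and $a$ such that

$$\lambda \ge \frac{4L}{\eta_d^{1/d}\, p_{\min}^{1+1/d}} \left(\frac{k}{n}\right)^{1/d}, \qquad a \le 1 - \log_k\!\left(4^d (1+\lambda)^2\right).$$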



Fix two points $x = X_i$ and $y = X_j$. Then there exist $e_1(\lambda, k)$, $e_2(\lambda, k, n)$, $e_3(\lambda)$ (see below for explicit definitions) such that with probability at least $1 - 3 e_3 n \exp(-\lambda^2 k^a / 6)$ we have

$$e_1 D_q(x, y) \le e_2 D_{sp}(x, y) \le D_q(x, y) + e_2.$$

Moreover if $n \to \infty$, $k \to \infty$, $k/n \to 0$, $\lambda \to 0$ and $\lambda^2 k^a / \log(n) \to \infty$, then the probability converges to 1 and $(k/(\eta_d n))^{1/d} D_{sp}(x, y)$ converges to $D_q(x, y)$ in probability.

The convergence conditions on n and k are the ones to be expected for random geometric graphs. The condition $\lambda^2 k^a / \log(n) \to \infty$ is slightly stronger than the usual $k / \log(n) \to \infty$ condition. This condition is satisfied as soon as k is of an order slightly larger than $\log(n)$. For example $k \approx \log(n)^{1+\alpha}$ with a small $\alpha$ will work. For k smaller than $\log(n)$, the graphs are not connected anyway (see e.g. Penrose 1999) and are unsuitable for machine learning applications.
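As a small numerical aid, the rescaling factor $(k/(\eta_d n))^{1/d}$ from Theorem 1 can be evaluated as follows. The sketch assumes the standard closed form $\eta_d = \pi^{d/2} / \Gamma(d/2 + 1)$ for the volume of the Euclidean unit ball; the function names and the default $\alpha = 0.1$ in `k_polylog` are our own illustrative choices, not prescriptions from the theorem.

```python
import math


def unit_ball_volume(d):
    """Volume eta_d of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def sp_rescaling(n, k, d):
    """Factor (k / (eta_d * n))^(1/d) that multiplies the hop-count
    SP distance in Theorem 1."""
    return (k / (unit_ball_volume(d) * n)) ** (1.0 / d)


def k_polylog(n, alpha=0.1):
    """One admissible growth rate for k: log(n)^(1 + alpha), rounded up.
    alpha is an illustrative choice."""
    return max(1, math.ceil(math.log(n) ** (1 + alpha)))
```

Multiplying the hop count between two vertices by `sp_rescaling(n, k, d)` gives the quantity that, according to Theorem 1, approaches $D_q(x, y)$ for large n.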

Before proving Theorem 1 we need to state a couple of propositions and lemmas. We start by introducing some ad-hoc notation:

Definition 2 (Connectivity parameters) Consider a geometric graph based on a fixed set of points $X_1, \dots, X_n \in \mathbb{R}^d$. Let $r_{low}$ be a real number such that $D_f(X_i, X_j) \le r_{low}$ implies that $X_i$ is connected to $X_j$ in the graph. Analogously, consider $r_{up}$ to be a real number such that $D_f(X_i, X_j) > r_{up}$ implies that $X_i$ is not connected to $X_j$ in the graph.

Definition 3 (Dense sampling assumption) Consider a graph G with connectivity parameters $r_{low}$ and $r_{up}$. We say that it satisfies the dense sampling assumption if there exists a $\varsigma \le r_{low}/4$ such that for all $x \in \mathcal{X}$ there exists a vertex y in the graph with $D_f(x, y) \le \varsigma$.

Proposition 4 (Bounding $D_{sp}$ by $D_f$) Consider any unweighted geometric graph based on a fixed set $X_1, \dots, X_n \in \mathcal{X} \subset \mathbb{R}^d$ that satisfies the dense sampling assumption. Fix two vertices x and y of the graph and set

$$e_1 = \frac{r_{low} - 2\varsigma}{r_{up}}, \qquad e_2 = r_{low} - 2\varsigma.$$

Then the following statement holds:

$$e_1 D_f(x, y) \le e_2 D_{sp}(x, y) \le D_f(x, y) + e_2.$$

Figure 3. Path constructions in the proofs of Proposition 4 (top) and Theorem 9 (bottom).

Proof. Right hand side. Consider the f-geodesic path $\gamma^*$ connecting x to y. Divide $\gamma^*$ into segments by $u_0 = x, u_1, \dots, u_t, u_{t+1} = y$ such that $D_f(u_i, u_{i+1}) = r_{low} - 2\varsigma$ for $i = 0, \dots, t-1$ and $D_f(u_t, u_{t+1}) \le r_{low} - 2\varsigma$ (see Figure 3). Because of the dense sampling assumption, for all $i = 1, \dots, t$ there exists a vertex $v_i$ in the ball $B(u_i, \varsigma, D_f)$ and we have

$$D_f(v_i, u_i) \le \varsigma, \qquad D_f(u_i, u_{i+1}) \le r_{low} - 2\varsigma, \qquad D_f(u_{i+1}, v_{i+1}) \le \varsigma.$$

Applying the triangle inequality gives $D_f(v_i, v_{i+1}) \le r_{low}$, which shows that $v_i$ and $v_{i+1}$ are connected. By summing up along the path we get

$$(r_{low} - 2\varsigma)(D_{sp}(x, y) - 1) \le (r_{low} - 2\varsigma)\, t = \sum_{i=0}^{t-1} D_f(u_i, u_{i+1}) \overset{(a)}{\le} D_f(x, y).$$

In step (a) we use the simple fact that if u is on the f-geodesic path from x to y, then

$$D_f(x, y) = D_f(x, u) + D_f(u, y).$$

Left hand side. Assume that the graph SP between x and y consists of vertices $z_0 = x, z_1, \dots, z_s = y$. By $D_f(z_i, z_{i+1}) \le r_{up}$ we can write

$$(r_{low} - 2\varsigma)\, D_{sp}(x, y) \ge \frac{r_{low} - 2\varsigma}{r_{up}} \sum_{i=0}^{s-1} D_f(z_i, z_{i+1}) \ge \frac{r_{low} - 2\varsigma}{r_{up}}\, D_f(x, y).$$

□



The next lemma uses the Lipschitz continuity and boundedness of p to show that $q(x)\|x - y\|$ is a good approximation of $D_q(x, y)$ in small intervals.

Lemma 5 (Approximating $D_q$ in small balls)

Consider any given $\lambda < 1$. If $\|x - y\| \le p_{\min} \lambda / L$ then the following statements hold:

1. We can approximate $p(y)$ by the density at x:

$$p(y)(1 - \lambda) \le p(x) \le p(y)(1 + \lambda).$$

2. We can approximate $D_q(x, y)$ by $q(x)\|x - y\|$:

$$(1 - \lambda)^{1/d}\, q(x)\|x - y\| \le D_q(x, y) \le (1 + \lambda)^{1/d}\, q(x)\|x - y\|.$$

Proof. Part (1). By the Lipschitz continuity of p, for $\|x - y\| \le \delta$ we have

$$|p(x) - p(y)| \le L\|x - y\| \le L\delta.$$

Setting $\delta = \lambda p_{\min} / L$ leads to the result.

Part (2). The previous part can be written as

$$(1 - \lambda)^{1/d}\, q(x) \le q(y) \le (1 + \lambda)^{1/d}\, q(x).$$

Denote the q-geodesic path between x and y by $\gamma^*$ and the line segment connecting x to y by $l$. Using the definition of a q-geodesic path, we can write

$$\int_{\gamma^*} q(\gamma^*(t))\,|\gamma^{*\prime}(t)|\,dt \le \int_{l} q(l(t))\,|l'(t)|\,dt \le (1 + \lambda)^{1/d} \int_{l} q(x)\,|l'(t)|\,dt = (1 + \lambda)^{1/d}\, q(x)\|x - y\|.$$

Also,

$$\int_{\gamma^*} q(\gamma^*(t))\,|\gamma^{*\prime}(t)|\,dt \ge (1 - \lambda)^{1/d}\, q(x) \int_{\gamma^*} |\gamma^{*\prime}(t)|\,dt \ge (1 - \lambda)^{1/d}\, q(x)\|x - y\|.$$

□



Now we are going to show how the quantities $r_{low}$ and $r_{up}$ introduced in Definition 2 can be bounded in random unweighted kNN graphs and how they are related to the metric $D_q$.

To this end, we define the kNN q-radii at vertex x as $R_{q,k}(x) = D_q(x, y)$ and the approximated kNN q-radii at vertex x as $\hat{R}_{q,k}(x) = q(x)\|x - y\|$, where y is the k-nearest neighbor of x. The minimum and maximum values of the kNN q-radii are defined as

$$R^{\min}_{q,k} = \min_u R_{q,k}(u), \qquad R^{\max}_{q,k} = \max_u R_{q,k}(u).$$

Accordingly we define $\hat{R}^{\min}_{q,k}$ and $\hat{R}^{\max}_{q,k}$ for the approximated q-radii.

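The following proposition is a direct adaptation of Proposition 31 from von Luxburg et al. (2010).

Proposition 6 (Bounding $R^{\min}_{q,k}$ and $R^{\max}_{q,k}$)
Given $\lambda \le 1/2$ define $r_{low}$ and $r_{up}$ as

$$r_{low} := \left(\frac{k}{(1 + \lambda)\, n\, \eta_d}\right)^{1/d}, \qquad r_{up} := \left(\frac{k}{(1 - \lambda)\, n\, \eta_d}\right)^{1/d},$$

and radii $\tilde{r}_{low}$ and $\tilde{r}_{up}$ as

$$\tilde{r}_{low} := \frac{r_{low}}{(1 + \lambda)^{1/d}}, \qquad \tilde{r}_{up} := \frac{r_{up}}{(1 - \lambda)^{1/d}}.$$

Assume that $\tilde{r}_{up} \le \lambda\, p_{\min}^{1+1/d} / L$. Then

$$P\left(R^{\min}_{q,k} \le r_{low}\right) \le n \exp(-\lambda^2 k / 6), \qquad P\left(R^{\max}_{q,k} \ge r_{up}\right) \le n \exp(-\lambda^2 k / 6).$$

Proof. Consider a ball $B_x$ with radius $\tilde{r}_{low} / q(x)$ around x. Note that $\tilde{r}_{low} / q(x) \le p_{\min} \lambda / L$, so we can bound the density of points in $B_x$ by $(1 + \lambda) p(x)$ using Lemma 5. Denote the probability mass of the ball by $\mu(x)$, which is bounded by

$$\mu(x) = \int_{B_x} p(s)\,ds \le (1 + \lambda)\, p(x) \int_{B_x} ds = (1 + \lambda)\, \tilde{r}_{low}^{\,d}\, \eta_d =: \mu_{\max}.$$

Observe that $\hat{R}_{q,k}(x) \le \tilde{r}_{low}$ if and only if there are at least k data points in $B_x$. Let $Q \sim \mathrm{Binomial}(n, \mu(x))$ and $S \sim \mathrm{Binomial}(n, \mu_{\max})$. By the choice of $\tilde{r}_{low}$ we have $E(S) = k / (1 + \lambda)$. It follows that

$$P\left(\hat{R}_{q,k}(x) \le \tilde{r}_{low}\right) = P(Q \ge k) \le P(S \ge k) = P\left(S \ge (1 + \lambda) E(S)\right).$$

Now we apply a concentration inequality for binomial random variables (see Prop. 28 in von Luxburg et al. 2010) and a union bound to get

$$P\left(\hat{R}^{\min}_{q,k} \le \tilde{r}_{low}\right) \le P\left(\exists i : \hat{R}_{q,k}(X_i) \le \tilde{r}_{low}\right) \le n \exp\left(-\frac{\lambda^2 k}{3(1 + \lambda)}\right) \le n \exp(-\lambda^2 k / 6).$$

By a similar argument we can prove the analogous statement for $\hat{R}^{\max}_{q,k}$. Finally, Lemma 5 gives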

"> k ~ (1 + A)V«* ' q ' k ~ (1- A) 1 /"*" 

□ 

The following lemma shows how the sampling parameter $\varsigma$ can be chosen to satisfy the dense sampling assumption. Note that we decided to choose $\varsigma$ in a form that keeps the statements in Theorem 1 simple, rather than optimizing over the parameter $\varsigma$ to maximize the probability of success.

Lemma 7 (Sampling lemma) Assume $X_1, \dots, X_n$ are sampled i.i.d. from a probability distribution p and a constant $a < 1$ is given. Set $\lambda$ as in Theorem 1,

$$\varsigma := (1 + \lambda)^{1/d} \left(\frac{k^a}{\eta_d\, n}\right)^{1/d},$$

and $e_3(\lambda) := 2^d / (1 - \lambda)^2$. Then with probability at least $1 - e_3 n \exp(-k^a / 6)$, for every $x \in \mathcal{X}$ there exists a $y \in \{X_1, \dots, X_n\}$ such that $D_q(x, y) \le \varsigma$.

Proof. Define $\varsigma_0 = (1 + \lambda)^{-1/d} \varsigma$. We prove that for every $x \in \mathcal{X}$, there exists a vertex y such that $q(x)\|x - y\| \le \varsigma_0$. Then using Lemma 5 will give the result.

The proof idea is a generalization of the covering argument in the proof of the sampling lemma in Tenenbaum et al. (2000). We first construct a covering of $\mathcal{X}$ that consists of balls with approximately the same probability mass. The centers of the balls are chosen by an iterative procedure that ensures that no center is contained in any of the balls we have so far. We choose the radius $\varsigma_0 / q(x)$ for the ball at point x and call it $B_q(x, \varsigma_0)$. The probability mass of this ball can be bounded by

$$V(B_q(x, \varsigma_0)) \ge (1 - \lambda)\, \varsigma_0^d\, \eta_d.$$

Note that the smaller balls $B_q(u, (1 - \lambda)^{1/d} \varsigma_0 / 2)$ are all disjoint. To see this, consider two balls $B_q(x, (1 - \lambda)^{1/d} \varsigma_0 / 2)$ and $B_q(y, (1 - \lambda)^{1/d} \varsigma_0 / 2)$. Observe that

$$\frac{(1 - \lambda)^{1/d} \varsigma_0}{2 q(x)} + \frac{(1 - \lambda)^{1/d} \varsigma_0}{2 q(y)} \le \frac{\varsigma_0}{q(x)}.$$

We can bound the total number of balls by

$$S \le \frac{1}{V\left(B_q(x, (1 - \lambda)^{1/d} \varsigma_0 / 2)\right)} \le \frac{2^d}{\varsigma_0^d\, \eta_d\, (1 - \lambda)^2}.$$

Now we sample points from the underlying space and apply the same concentration inequality as above. We bound the probability that a ball $B_q(u, \varsigma_0)$ does not contain any sample point ("is empty") by

$$\Pr(\text{Ball } i \text{ is empty}) \le \exp(-n \varsigma_0^d \eta_d / 6).$$

Rewriting and substituting the value of $\varsigma_0$ gives

$$\Pr(\text{no ball is empty}) \ge 1 - \sum_i \Pr(B_i \text{ is empty}) \ge 1 - S\, e^{-n \varsigma_0^d \eta_d / 6} \ge 1 - \frac{2^d\, n\, e^{-k^a/6}}{(1 - \lambda)^2\, k^a} \ge 1 - e_3\, n\, e^{-k^a/6}.$$

□

Proof of Theorem 1. Set $r_{low}$ and $r_{up}$ as in Proposition 6. The assumption on $\lambda$ ensures that $\tilde{r}_{up} \le \lambda\, p_{\min}^{1+1/d} / L$. It follows from Proposition 6 that the statements about $r_{low}$ and $r_{up}$ in Definition 2 both hold for $G_n$ with probability at least $\mu_1 = 1 - 2n \exp(-\lambda^2 k / 6)$. Set $\varsigma$ as in Lemma 7 and define the constant $a \le 1 - \log_k\left(4^d (1 + \lambda)^2\right)$. By this choice we have $r_{low} \ge 4\varsigma$. Lemma 7 shows that the sampling assumption holds in $G_n$ for the selected $\varsigma$ with probability at least $\mu_2 = 1 - e_3 n \exp(-k^a / 6)$. Together, all these statements about $G_n$ hold with probability at least $\mu := 1 - 3 e_3 n \exp(-\lambda^2 k^a / 6)$.

Using Proposition 4 completes the first part of the theorem. For the convergence we have

$$e_1 = \frac{r_{low} - 2\varsigma}{r_{up}} = \left(\frac{1 - \lambda}{1 + \lambda}\right)^{1/d} - \frac{2\,(1 - \lambda^2)^{1/d}}{k^{(1-a)/d}}.$$

This shows that $e_1 \to 1$ as $\lambda \to 0$ and $k \to \infty$. For $\lambda \to 0$ and $k \to \infty$ we can set a to any constant smaller than 1. Finally it is easy to check that $e_2 \to 0$ and $\varsigma / r_{low} \to 0$. □
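To get a feeling for the constants in Theorem 1, the following sketch evaluates $r_{low}$, $r_{up}$, $\varsigma$ and $e_1, e_2, e_3$ for concrete values of n, k, d and $\lambda$. It assumes the formulas $r_{low} = (k/((1+\lambda) n \eta_d))^{1/d}$, $r_{up} = (k/((1-\lambda) n \eta_d))^{1/d}$, $\varsigma = ((1+\lambda) k^a/(n \eta_d))^{1/d}$ and the closed form $\eta_d = \pi^{d/2}/\Gamma(d/2+1)$; the function name and the default of taking a at its upper bound are our own choices for illustration.

```python
import math


def unit_ball_volume(d):
    """Volume eta_d of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def theorem1_constants(n, k, d, lam, a=None):
    """Evaluate the connectivity and sampling parameters of Section 3.
    If a is None, use the largest admissible value
    a = 1 - log_k(4^d (1 + lam)^2)."""
    eta = unit_ball_volume(d)
    if a is None:
        a = 1.0 - math.log(4 ** d * (1 + lam) ** 2, k)
    r_low = (k / ((1 + lam) * n * eta)) ** (1 / d)
    r_up = (k / ((1 - lam) * n * eta)) ** (1 / d)
    sigma = ((1 + lam) * k ** a / (n * eta)) ** (1 / d)
    return {
        "a": a,
        "r_low": r_low,
        "r_up": r_up,
        "sigma": sigma,
        "e1": (r_low - 2 * sigma) / r_up,
        "e2": r_low - 2 * sigma,
        "e3": 2 ** d / (1 - lam) ** 2,
    }
```

With a at its upper bound, $\varsigma$ equals $r_{low}/4$ exactly, so $e_1 = \tfrac{1}{2}((1-\lambda)/(1+\lambda))^{1/d}$; choosing a smaller a (allowed when k is large) moves $e_1$ closer to 1.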

4. Shortest paths in weighted graphs 

In this section we discuss both questions from the Introduction. We also extend our results from the previous section to weighted kNN graphs and ε-graphs.

4.1. Weight assignment problem 

Consider a graph based on the i.i.d. sample $X_1, \dots, X_n \in \mathcal{X}$ from the density p. We are given a positive scalar function f which is only a function of the density: $f(x) = \tilde{f}(p(x))$. We want to assign edge weights such that the graph SP distance converges to the f-distance in $\mathcal{X}$.

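It is well known that the f-length of a curve $\gamma : [a, b] \to \mathcal{X}$ can be approximated by a Riemann sum over a partition of $[a, b]$ into subintervals $[x_i, x_{i+1}]$:

$$\hat{D}_{f,\gamma} = \sum_i f\left(\frac{\gamma(x_i) + \gamma(x_{i+1})}{2}\right) \|\gamma(x_i) - \gamma(x_{i+1})\|.$$

As the partition gets finer, the approximation $\hat{D}_{f,\gamma}$ converges to $D_{f,\gamma}$ (cf. Chapter 3 of Gamelin 2007). This suggests using edge weights

$$w_{ij} = \tilde{f}\left(p\left(\frac{X_i + X_j}{2}\right)\right) \|X_i - X_j\|.$$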
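The Riemann sum above is easy to evaluate for a path given as a list of points. The following sketch is only an illustration of that approximation (the name `f_length` and the polyline representation of the path are our assumptions): it evaluates f at the midpoint of every segment and multiplies by the Euclidean segment length.

```python
import math


def f_length(path, f):
    """Midpoint Riemann sum of the f-length of a polyline.

    path: list of points (tuples of floats) visited in order.
    f:    positive function taking a point and returning a float.
    """
    total = 0.0
    for a, b in zip(path, path[1:]):
        mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
        total += f(mid) * math.dist(a, b)
    return total
```

For a straight segment from (0, 0) to (1, 0) and $f(x) = 1 + x_1$, the exact f-length is 3/2, and the midpoint sum reproduces this value up to rounding because f is linear along the path.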



However, the underlying density $p(x)$ is not known in many machine learning applications. Sajama & Orlitsky (2005) already proved that the plug-in approach using a kernel density estimator $\hat{p}(x)$ for $p(x)$ will lead to the convergence of the SP distance to the f-distance in ε-graphs. Our next result shows how to choose edge weights in kNN graphs without estimating the density. It is a corollary of a theorem that will be presented in Section 4.2.

We use a notational convention to simplify our arguments and hide approximation factors that will eventually go to 1 as the sample size goes to infinity. We say that f is approximately larger than g ($f \succeq_\lambda g$) if there exists a function $e(\lambda)$ such that $f \ge e(\lambda) g$ and $e(\lambda) \to 1$ as $n \to \infty$ and $\lambda \to 0$. The symbol $\preceq_\lambda$ is defined similarly. We use the notation $f \approx_\lambda g$ if $f \preceq_\lambda g$ and $f \succeq_\lambda g$.

Corollary 8 (Weight assignment) Consider the kNN graph based on the i.i.d. sample $X_1, \dots, X_n \in \mathcal{X}$ from the density p. Let f be of the form $f(x) = \tilde{f}(p(x))$ with $\tilde{f}$ increasing. We assume that $\tilde{f}$ is Lipschitz continuous and f is bounded away from 0. Define $r = (k/(n \eta_d))^{1/d}$ and set the edge weights

$$w_{ij} = \|X_i - X_j\|\; \tilde{f}\left(\frac{r^d}{\|X_i - X_j\|^d}\right). \qquad (1)$$

Fix two points $x = X_i$ and $y = X_j$. Choose $\lambda$ and $a$ as in Theorem 9. Then with probability at least $1 - 3 e_3 n \exp(-\lambda^2 k^a / 6)$ we have $D_{sp}(x, y) \approx_\lambda D_f(x, y)$.
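The edge weights of Equation (1) need only the edge length and the graph parameters, not a density estimate. A minimal sketch, assuming the closed form $\eta_d = \pi^{d/2}/\Gamma(d/2+1)$ and using hypothetical names (`density_free_weight`, `f_tilde`) of our own:

```python
import math


def unit_ball_volume(d):
    """Volume eta_d of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def density_free_weight(dist, n, k, d, f_tilde):
    """Edge weight w_ij = dist * f_tilde(r^d / dist^d) from Equation (1).

    dist:    Euclidean length ||X_i - X_j|| of a kNN edge (must be > 0).
    f_tilde: increasing function of the density value.
    """
    r = (k / (n * unit_ball_volume(d))) ** (1.0 / d)
    return dist * f_tilde((r / dist) ** d)
```

The argument $r^d / \|X_i - X_j\|^d$ plays the role of the unknown density value: short kNN edges indicate high density and long edges indicate low density. With a constant `f_tilde` the weight reduces to the Euclidean edge length.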

4.2. Limit distance problem 

Consider a weighted graph based on the i.i.d. sample $X_1, \dots, X_n \in \mathcal{X}$ from the density p. We are given an increasing edge weight function $h : \mathbb{R}^+ \to \mathbb{R}^+$ which assigns weight $h(\|x - y\|)$ to the edge $(x, y)$. We are interested in finding the limit of the graph SP distance with respect to the edge weight function h as the sample size goes to infinity. In particular we are looking for a distance function f such that the SP distance converges to the f-distance.

Assume we knew the solution $f^* = \tilde{f}^*(p(x))$ of this problem. To guarantee the convergence of the distances, $f^*$ should assign weights of the form $w_{ij} \approx \tilde{f}^*(p(X_i))\, \|X_i - X_j\|$. This would mean

$$\tilde{f}^*(p(X_i)) \approx \frac{h(\|X_i - X_j\|)}{\|X_i - X_j\|},$$

which shows that determining $f^*$ is closely related to finding a density based estimation for $\|X_i - X_j\|$.

Depending on h, we distinguish two regimes for this 
problem: subadditive and superadditive. 



4.2.1. Subadditive weights 

A function $h(x)$ is called subadditive if $\forall x, y \ge 0$: $h(x) + h(y) \ge h(x + y)$. Common examples of subadditive functions are $f(x) = x^a$, $a < 1$ and $f(x) = x e^{-x}$. For a subadditive h, the SP in the graph will satisfy the triangle inequality and it will prefer jumping along distant vertices. Based on this intuition, we come up with the following guess for vertices along the SP: For ε-graphs we have the approximation $\|X_i - X_j\| \approx \varepsilon$ and $f(x) = h(\varepsilon)/\varepsilon$. For kNN-graphs we have $\|X_i - X_j\| \approx r/q(X_i)$ with $r = (k/(n \eta_d))^{1/d}$ and

$$f(x) = h\left(\frac{r}{q(x)}\right) \frac{q(x)}{r}.$$

We formally prove this statement for kNN graphs in the next theorem. In contrast to Theorem 1, the scaling factor is moved into f. The proof for ε-graphs is much simpler and can be adapted by setting $r = \varepsilon$, $q(x) = 1$, and $r_{low} = r_{up} = \varepsilon$.

Theorem 9 (Limit of SP in weighted graphs)

Consider the kNN graph based on the i.i.d. sample $X_1, \dots, X_n \in \mathcal{X}$ from the density p. Let h be an increasing, Lipschitz continuous and subadditive function, and define the edge weights $w_{ij} = h(\|X_i - X_j\|)$. Fix two points $x = X_i$ and $y = X_j$. Define $r = (k/(n \eta_d))^{1/d}$ and set

$$f(x) = h\left(\frac{r}{q(x)}\right) \frac{q(x)}{r}.$$

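Choose $\lambda$ and $a$ such that

$$\lambda \ge \frac{4L}{\eta_d^{1/d}\, p_{\min}^{1+1/d}} \left(\frac{k}{n}\right)^{1/d}, \qquad a \le 1 - \log_k\!\left(4^d (1+\lambda)^2\right).$$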

Then with probability at least $1 - 3 e_3 n \exp(-\lambda^2 k^a / 6)$ we have $D_{sp}(x, y) \approx_\lambda D_f(x, y)$.

Proof. The essence of the proof is similar to the one in Theorem 1; we present a sketch only. The main step is to adapt Proposition 4 to weighted graphs with weight function h. Adapting Lemma 5 for general f is straightforward. The lemma states that $D_f(x, y) \approx_\lambda f(x)\|x - y\|$ for nearby points. We set $r_{low}$ and $\varsigma$ as in the sampling lemma and Proposition 6 (these are properties of kNN graphs and hold for any f). Proposition 6 says that in kNN graphs, x is connected to y with high probability iff $\|x - y\| \preceq_\lambda r/q(x)$. The probabilistic argument and the criteria on choosing $\lambda$ are similar to Theorem 1.

First we show that $D_{sp}(x, y) \preceq_\lambda D_f(x, y)$. Consider the f-geodesic path $\gamma^*_{x,y}$ connecting x to y. Divide $\gamma^*_{x,y}$ into segments $u_0 = x, u_1, \dots, u_t, u_{t+1} = y$ such that $D_q(u_i, u_{i+1}) = r_{low} - 2\varsigma$ for $i = 0, \dots, t-1$ and $D_q(u_t, u_{t+1}) \le r_{low} - 2\varsigma$ (see Figure 3). There exists a vertex $v_i$ near to $u_i$ such that $v_i$ and $v_{i+1}$ are connected. We show that the length of the path $x, v_1, \dots, v_t, y$ is approximately smaller than $D_f(x, y)$. From the path construction we have

$$\|v_i - v_{i+1}\| \approx_\lambda \|u_i - u_{i+1}\| \approx_\lambda r/q(u_i).$$

By summing up along the path we get

$$D_{sp}(x, y) \le \sum_i h(\|v_i - v_{i+1}\|) \approx_\lambda \sum_i h(\|u_i - u_{i+1}\|) \approx_\lambda \sum_i h\left(\frac{r}{q(u_i)}\right) = \sum_i f(u_i) \frac{r}{q(u_i)} \approx_\lambda \sum_i f(u_i)\|u_i - u_{i+1}\|.$$

From the adaptation of Lemma 5 we have $D_f(u_i, u_{i+1}) \approx_\lambda f(u_i)\|u_i - u_{i+1}\|$, which gives

$$\sum_i f(u_i)\|u_i - u_{i+1}\| \approx_\lambda \sum_i D_f(u_i, u_{i+1}) = D_f(x, y).$$

This shows that $D_{sp}(x, y) \preceq_\lambda D_f(x, y)$.

For the other way round, we use a technique different from Proposition 4. Denote the graph shortest path between x and y by $\pi : z_0 = x, z_1, \dots, z_s, z_{s+1} = y$. Consider $\pi'$ as a continuous path in $\mathcal{X}$ corresponding to $\pi$. As in the previous part, divide $\pi'$ into segments $u_0 = x, u_1, \dots, u_t, u_{t+1} = y$ (see Figure 3). From $D_q(z_i, z_{i+1}) \preceq_\lambda r$ and $D_q(u_i, u_{i+1}) \approx_\lambda r$ we have $s \succeq_\lambda t$. Using this and the subadditivity of h we get

$$D_{sp}(x, y) = \sum_i h(\|z_i - z_{i+1}\|) \succeq_\lambda \sum_i h(\|u_i - u_{i+1}\|).$$

To prove $D_{sp}(x, y) \succeq_\lambda D_f(x, y)$, we can write

$$\sum_i h(\|u_i - u_{i+1}\|) \approx_\lambda \sum_i h\left(\frac{r}{q(u_i)}\right) = \sum_i f(u_i) \frac{r}{q(u_i)} \approx_\lambda \sum_i f(u_i)\|u_i - u_{i+1}\| \approx_\lambda \sum_i D_f(u_i, u_{i+1}) \ge D_f(x, y).$$

□

Corollary 8 is a direct consequence of this theorem. It follows by choosing $h(t) = t\, \tilde{f}(r^d / t^d)$ (which is subadditive if $\tilde{f}$ is increasing) and setting

$$w_{ij} = h(\|X_i - X_j\|).$$

4.2.2. Superadditive weights

A function h is called superadditive if $\forall x, y \ge 0$: $h(x) + h(y) \le h(x + y)$. Examples are $f(x) = x^a$, $a > 1$ and $f(x) = x e^x$. To get an intuition on the behavior of the SP for a superadditive h, take an example of three vertices x, y, z which are all connected in the graph and sit on a straight line such that $\|x - y\| + \|y - z\| = \|x - z\|$. By the superadditivity, the SP between x and z will prefer going through y rather than directly jumping to z. More generally, the graph SP will prefer taking many "small" edges rather than fewer "long" edges. For this reason, we do not expect a big difference between superadditive weighted kNN graphs and ε-graphs: the long edges in the kNN graph will not be used anyway. However, due to technical problems we did not manage to prove a formal theorem to this effect.
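The three-vertex example can be checked numerically. The sketch below (with the hypothetical helper name `direct_vs_relay`) compares the cost of the direct edge from x to z with the cost of relaying through y for a given weight function h, where a and b are the Euclidean lengths of the two short edges.

```python
import math


def direct_vs_relay(h, a, b):
    """Return (cost of the direct edge x -> z, cost of the path x -> y -> z)
    for collinear points with ||x - y|| = a and ||y - z|| = b."""
    return h(a + b), h(a) + h(b)


# Subadditive example h(x) = sqrt(x): the direct jump is cheaper.
direct_sub, relay_sub = direct_vs_relay(math.sqrt, 1.0, 1.0)

# Superadditive example h(x) = x^2: relaying through y is cheaper.
direct_sup, relay_sup = direct_vs_relay(lambda x: x ** 2, 1.0, 1.0)
```

For h(x) = sqrt(x) the direct edge costs sqrt(2) against 2 for the relay, while for h(x) = x^2 the direct edge costs 4 against 2, which matches the two regimes described in Sections 4.2.1 and 4.2.2.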

The special case of the superadditive family $h(x) = x^a$, $a > 1$ is treated in Hwang & Hero (2012) by completely different methods. Although their results are presented for complete graphs, we believe that they can be extended to ε- and kNN graphs. We are not aware of any other result for the limit of the SP distance in the superadditive regime.

5. Consequences in applications 

In this section we study the consequences of our re- 
sults on manifold embedding using Isomap and on a 
particular semi-supervised learning method. 

There are two cases where we do not expect a drastic difference between the SP in weighted and unweighted kNN graphs: (1) If the underlying density p is close to uniform. (2) If the intrinsic dimensionality d of our data is high. The latter is because in the q-distance, the underlying density arises in the form of $p(x)^{1/d}$, where the exponent flattens the distribution for large d.

5.1. Isomap 

Isomap is a widely used method for low dimensional manifold embedding (Tenenbaum et al. 2000). The main idea is to use metric multidimensional scaling on the matrix of pairwise geodesic distances. Using the Euclidean length of edges as their weights will lead to the convergence of the SP distance to the geodesic distance. But what would be the effect of applying Isomap to unweighted graphs?

Our results of the last section already hint that there is no big difference between unweighted and weighted ε-graphs for Isomap. However, the case of kNN graphs is different because weighted and unweighted shortest paths measure different quantities. The effect of applying Isomap to unweighted kNN graphs can easily be demonstrated by the following simulation. We sample 2000 points in $\mathbb{R}^2$ from a distribution that has two uniform high-density squares, surrounded by a uniform low density region. An unweighted kNN graph is constructed with k = 10, and we apply Isomap with target dimension 2. The result is depicted in Figure 2. We can see that the Isomap embedding heavily distorts the original data: it stretches high density regions and compacts low density regions to make the vertex distribution close to uniform.
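A data set of the kind used in this simulation can be generated as a mixture of uniform distributions. The sketch below is only one possible instance: the positions and side length of the two squares, the mixture weight `p_high` and the unit-square domain are assumptions of ours, since the text does not specify them.

```python
import random


def sample_two_squares(n, p_high=0.8, seed=0):
    """Sample n points in the unit square from a piecewise constant density:
    two small squares carry extra mass p_high on top of a uniform background.
    Square positions, side length and p_high are illustrative assumptions."""
    rng = random.Random(seed)
    side = 0.2
    corners = [(0.1, 0.4), (0.7, 0.4)]  # lower-left corners of the squares
    points = []
    for _ in range(n):
        if rng.random() < p_high:
            x0, y0 = rng.choice(corners)
            points.append((x0 + side * rng.random(), y0 + side * rng.random()))
        else:
            points.append((rng.random(), rng.random()))
    return points


points = sample_two_squares(2000)
```

Building an unweighted kNN graph with k = 10 on such a sample and feeding the matrix of hop distances to metric multidimensional scaling would mirror the experiment described above; the embedding step itself is not shown here.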
5.2. Semi-supervised learning

Our work has a close relationship to some of the literature on semi-supervised learning (SSL). In regularization based approaches, the underlying density is either exploited implicitly as attempted in Laplacian regularization (Zhu et al. 2003; but see Nadler et al. 2009, Alamgir & von Luxburg 2011 and Zhou & Belkin 2011), or more explicitly as in measure based regularization (Bousquet et al. 2004). Alternatively, one defines new distance functions on the data that take the density of the unlabeled points into account. Here, the papers by Sajama & Orlitsky (2005) and Bijral et al. (2011) are most related to our paper. Both papers suggest different ways to approximate the density based distance from the data. In Sajama & Orlitsky (2005) it is achieved by estimating the underlying density, while in Bijral et al. (2011) the authors omit the density estimation and use an approximation.

Our work shows a simpler way to converge to a similar distance function for a specific family of f-distances, namely constructing a kNN graph and assigning edge weights as in Equation (1).

6. Conclusions and outlook 

We have seen in this paper that the shortest path dis- 
tance on unweighted kNN graphs has a very funny 
limit behavior: it prefers to go through regions of 
low density and even takes large detours in order to 
avoid the high density regions. In hindsight, this 
result seems obvious, but most people are surprised 
when they first hear about it. In particular, we be- 
lieve that it is important to spread this insight among 
machine learning practitioners, who routinely use unweighted kNN graphs as a simple, robust alternative to ε-graphs.

In some sense, unweighted ε-graphs and unweighted kNN graphs behave as "duals" of each other: while degrees in ε-graphs reflect the underlying density, they are independent of the density in kNN graphs. While the shortest path in ε-graphs is independent of the underlying density and converges to the Euclidean distance, the shortest paths in kNN graphs take the density into account.

Current practice is to use ε- and kNN graphs more or less interchangeably in many applications, and the decision for one or the other graph is largely driven by robustness or convenience considerations. However, as our results show, it is important to be aware of the implicit consequences of this choice. Each graph carries different information about the underlying density, and depending on how a particular machine learning algorithm makes use of the graph structure, it might either miss out on or benefit from this information.